Development and comparison of two approaches for visual speech analysis with application to voice activity detection
Abstract
In this paper we present two novel methods for visual voice activity detection (V-VAD) which exploit the bimodality of speech (i.e. the coherence between a speaker’s lips and the resulting speech). The first method uses appearance parameters of a speaker’s lips, obtained from an active appearance model (AAM). An HMM then dynamically models the change in appearance over time. The second method uses a retinal filter on the region of the lips to extract the required parameter. Each method is applied in turn to a corpus of a single speaker and used to classify voice activity as speech or non-speech. The efficiency of each method is evaluated individually using receiver operating characteristics, and their respective performances are then compared and discussed. Both methods achieve a high correct silence detection rate for a small false detection rate.
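As a rough illustration of the receiver operating characteristic evaluation mentioned in the abstract, the sketch below sweeps a decision threshold over per-frame scores and reports one (false detection rate, correct silence detection rate) pair per threshold. It is not the authors' implementation: the function name `roc_points`, the scores, and the labels are hypothetical. It also assumes that "false detection" means a speech frame classified as silence, which the abstract does not spell out.

```python
def roc_points(scores, is_silence):
    """Return (false_detection_rate, correct_silence_rate) pairs, one per threshold.

    scores:     per-frame silence scores (higher = more likely silence); hypothetical.
    is_silence: ground-truth labels, True for silence frames, False for speech frames.
    """
    n_silence = sum(is_silence)
    n_speech = len(is_silence) - n_silence
    points = []
    # Sweep thresholds from strictest (highest score) to loosest (lowest score).
    for threshold in sorted(set(scores), reverse=True):
        detected = [s >= threshold for s in scores]
        # Silence frames correctly flagged as silence.
        hits = sum(d and y for d, y in zip(detected, is_silence))
        # Speech frames wrongly flagged as silence (assumed meaning of "false detection").
        false_alarms = sum(d and not y for d, y in zip(detected, is_silence))
        points.append((false_alarms / n_speech, hits / n_silence))
    return points


# Made-up example data: six frames, three of them silence.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]
points = roc_points(scores, labels)
for false_rate, correct_rate in points:
    print(f"false detection rate {false_rate:.2f} -> correct silence rate {correct_rate:.2f}")
```

A detector with "a high correct silence detection rate for a small false detection rate" corresponds to points near the top-left corner of this curve.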
Similar resources
A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)
Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One of the interesting topics in the multimedia domain is enabling computers to produce speech. Speech synthesis gives the computer the human ability to produce speech. Data-based and process-based approaches are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...
Speech Enhancement Using Gaussian Mixture Models, Explicit Bayesian Estimation and Wiener Filtering
Gaussian Mixture Models (GMMs) of power spectral densities of speech and noise are used with explicit Bayesian estimations in Wiener filtering of noisy speech. No assumption is made on the nature or stationarity of the noise. No voice activity detection (VAD) or any other means is employed to estimate the input SNR. The GMM mean vectors are used to form sets of over-determined system of equatio...
Speech and Reading Disorders Screening, and Problems in Structure and Function of Articulation Organs in Children in Mashhad City, Iran
Background and Objectives: Investigating the prevalence of speech and language disorders and the contributing factors can help determine the best treatment options suited to the needs of these patients. So far, no comprehensive study has been conducted on screening speech and reading disorders and problems in the structure and function of articulation organs (PSFAOs) in children in Mashhad City...
Improving Voice Outcomes After Injury to the Recurrent Laryngeal Nerve
Objectives: The present study aimed to determine the voice outcomes before and after the administration of voice therapy in patients who suffered an injury to the recurrent laryngeal nerve after undergoing thyroidectomy. Methods: The sample consisted of 26 patients (2 males and 24 females) aged between 18 and 80 years (m=55±12) who experienced injury to the recurrent laryngeal nerve fol...
Publication date: 2007